Bayesian GLMM Part4

Author

Murray Logan

Published

06/07/2025

1 Preparations

Load the necessary libraries

library(tidyverse) # for data wrangling
library(car) # for regression diagnostics
library(broom) # for tidy output
library(ggfortify) # for model diagnostics
library(knitr) # for kable
library(emmeans) # for estimating marginal means
library(MASS) # for glm.nb
library(brms)
library(broom.mixed)
library(tidybayes)
library(bayesplot)
library(standist) # for visualizing distributions
library(rstanarm)
library(cmdstanr) # for cmdstan
library(ggeffects)
library(rstan)
library(DHARMa)
library(ggridges)
library(easystats) # framework for stats, modelling and visualisation
library(patchwork)
library(modelsummary)
source("helperFunctions.R")

2 Scenario

Figure 1: Crab_shrimp_coral

To investigate synergistic coral defence by mutualist crustaceans, Mckeon et al. (2012) conducted an aquaria experiment in which colonies of a coral species were placed in a tank along with a predatory sea star and one of four symbiont combinations:

  • no symbiont,
  • a crab symbiont
  • a shrimp symbiont
  • both a crab and shrimp symbiont.

The experiments were conducted in a large octagonal flow-through seawater tank that was partitioned into eight sections, which thereby permitted two of each of the four symbiont combinations to be observed concurrently. The tank was left overnight and in the morning, the presence of feeding scars on each coral colony was scored as evidence of predation. The experiments were repeated ten times, each time with fresh coral colonies, sea stars and symbiont.

The ten experimental times represent blocks (random effects) within which the symbiont type (fixed effect) are nested.

3 Read in the data

mckeon <- read_csv("../data/mckeon.csv", trim_ws = TRUE)
Rows: 80 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): SYMBIONT
dbl (2): BLOCK, PREDATION

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(mckeon)
Rows: 80
Columns: 3
$ BLOCK     <dbl> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10…
$ PREDATION <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, …
$ SYMBIONT  <chr> "none", "none", "none", "none", "none", "none", "none", "non…
## Explore the first 6 rows of the data
head(mckeon)
# A tibble: 6 × 3
  BLOCK PREDATION SYMBIONT
  <dbl>     <dbl> <chr>   
1     1         0 none    
2     1         1 none    
3     2         1 none    
4     2         1 none    
5     3         1 none    
6     3         1 none    
str(mckeon)
spc_tbl_ [80 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ BLOCK    : num [1:80] 1 1 2 2 3 3 4 4 5 5 ...
 $ PREDATION: num [1:80] 0 1 1 1 1 1 1 1 1 1 ...
 $ SYMBIONT : chr [1:80] "none" "none" "none" "none" ...
 - attr(*, "spec")=
  .. cols(
  ..   BLOCK = col_double(),
  ..   PREDATION = col_double(),
  ..   SYMBIONT = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
mckeon |> datawizard::data_codebook()
mckeon (80 rows and 3 variables, 3 shown)

ID | Name      | Type      | Missings |  Values |          N
---+-----------+-----------+----------+---------+-----------
1  | BLOCK     | numeric   | 0 (0.0%) | [1, 10] |         80
---+-----------+-----------+----------+---------+-----------
2  | PREDATION | numeric   | 0 (0.0%) |       0 | 30 (37.5%)
   |           |           |          |       1 | 50 (62.5%)
---+-----------+-----------+----------+---------+-----------
3  | SYMBIONT  | character | 0 (0.0%) |    both | 20 (25.0%)
   |           |           |          |   crabs | 20 (25.0%)
   |           |           |          |    none | 20 (25.0%)
   |           |           |          |  shrimp | 20 (25.0%)
------------------------------------------------------------
mckeon |> modelsummary::datasummary_skim()
Unique Missing Pct. Mean SD Min Median Max Histogram
BLOCK 10 0 5.5 2.9 1.0 5.5 10.0
PREDATION 2 0 0.6 0.5 0.0 1.0 1.0
SYMBIONT N %
both 20 25.0
crabs 20 25.0
none 20 25.0
shrimp 20 25.0
mckeon |> modelsummary::datasummary_skim(by = c("SYMBIONT"))
SYMBIONT Unique Missing Pct. Mean SD Min Median Max Histogram
BLOCK none 10 0 5.5 2.9 1.0 5.5 10.0
shrimp 10 0 5.5 2.9 1.0 5.5 10.0
crabs 10 0 5.5 2.9 1.0 5.5 10.0
both 10 0 5.5 2.9 1.0 5.5 10.0
PREDATION none 2 0 0.9 0.3 0.0 1.0 1.0
shrimp 2 0 0.6 0.5 0.0 1.0 1.0
crabs 2 0 0.6 0.5 0.0 1.0 1.0
both 2 0 0.4 0.5 0.0 0.0 1.0
SYMBIONT N %
both 20 25.0
crabs 20 25.0
none 20 25.0
shrimp 20 25.0

Since the response here is the presence or absence of predation (feeding scars), a binomial distribution is appropriate.

We need to make sure that the categorical variable and the random effect are defined as factors. When doing so, it might be valuable to rearrange the order of the fixed effect (SYMBIONT) such that the none group is considered the first group. This way, the other levels will all naturally be compared to this level (hence it will be treated as a reference of control group).

mckeon <- mckeon %>%
  mutate(
    BLOCK = factor(BLOCK),
    SYMBIONT = factor(SYMBIONT, levels = c("none", "crabs", "shrimp", "both"))
  )

4 Exploratory data analysis

Model formula: \[ y_i \sim{} \mathcal{N}(n, p_i)\\ ln\left(\frac{p_i}{1-p_1}\right) =\boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i} \]

where \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\) are vectors of the fixed and random effects parameters respectively and \(\bf{X}\) is the model matrix representing the overall intercept and effects of symbionts on the probability of the colony experiencing predation. \(\bf{Z}\) represents a cell means model matrix for the random intercepts associated with individual coral colonies.

ggplot(mckeon, aes(y = PREDATION, x = SYMBIONT)) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  facet_wrap(~BLOCK)

5 Fit the model

In rstanarm, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over-fitting) and help stabilise the computations.

mckeon.rstanarm <- stan_glmer(PREDATION ~ SYMBIONT + (1 | BLOCK),
  data = mckeon,
  family = binomial(link = "logit"),
  iter = 5000,
  warmup = 2000,
  chains = 3,
  thin = 5,
  refresh = 0,
  cores = 3
)
mckeon.rstanarm %>% prior_summary()
Priors for model '.' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0], scale = [2.5,2.5,2.5])
  Adjusted prior:
    ~ normal(location = [0,0,0], scale = [5.74,5.74,5.74])

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details

This tells us:

  • for the intercept (when the family is binomial), a normal prior with a mean of 0 and a standard deviation of 2.5 is used. These are the default priors for bernoulli models. A mean of 0 on a logit scale corresponds to a probability of 0.5. Hence the prior expected value is 0.5.

  • for the coefficients (in this case, just the slope), the default prior is a normal prior centred around 0 with a standard deviation of 2.5. For binomial models, this is then adjusted for the scale of the data by dividing the 2.5 by the standard deviation of the predictor (then rounded).

model.matrix(~SYMBIONT, data = mckeon) %>%
  apply(2, sd) %>%
  (function(x) 2.5 / x)
   (Intercept)  SYMBIONTcrabs SYMBIONTshrimp   SYMBIONTboth 
           Inf       5.737305       5.737305       5.737305 

Lets now run with priors only so that we can explore the range of values they allow in the posteriors.

mckeon.rstanarm1 <- update(mckeon.rstanarm, prior_PD = TRUE)
mckeon.rstanarm1 %>%
  ggpredict() %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))
You are calculating adjusted predictions on the population-level (i.e.
  `type = "fixed"`) for a *generalized* linear mixed model.
  This may produce biased estimates due to Jensen's inequality. Consider
  setting `bias_correction = TRUE` to correct for this bias.
  See also the documentation of the `bias_correction` argument.
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.

Conclusions:

  • it is very difficult to assess the priors from the predicted posteriors when the posteriors are on the response scale as they will always approach values of 0 and 1 (no matter how wide they are on the link scale).

Interestingly, if we instead use ggemmeans, it will (perhaps erroneously) generate the partial effects on the link scale (yet with an incorrectly labelled y-axis). Probability scale values of 0.99 and 0.01 (very large and small respectively) correspond to value of approximately 4.6 and -4.6 respectively on the logit scale.

In the following partial plot, the routine attempts to convert the y-axes tick marks into percentages. Hence a value of 0.1 is converted to 10%. Consequently, values of -5 and 5 are labelled as -500% and 500% respectively. We know that on the probability scale, values must asymptote towards 0 and 1. Posterior intervals beyond -5 and 5 might be considered wide.

mckeon.rstanarm1 %>%
  ggemmeans(~SYMBIONT) %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))

Alternatively, we could create the plot ourselves…

mckeon.rstanarm1 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  as.data.frame() %>%
  ggplot(aes(y = emmean, x = SYMBIONT)) +
  geom_hline(yintercept = c(5, -5), linetype = "dashed") +
  geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD)) +
  geom_point(
    data = mckeon, aes(y = PREDATION),
    position = position_jitter(width = 0.2, height = 0),
    alpha = 0.4, color = "red"
  )

The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]

When defining our own priors, we typically do not want them to be scaled.

If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):

  • \(\beta_0\): normal centred at 0 (since this is 0.5 on the logit scale) with a standard deviation of 2
    • mean of 0: since logit(0.5)
    • alternatively, an argument could be made for a mean of 0.51: since logit(mean(mckeon$PREDATION))
    • sd of 2: since this not too large on logit scale
  • \(\beta_1\): normal centred at 0 with a standard deviation of 0.5
    • sd of 0.5: since model.matrix(~SYMBIONT, data=mckeon)[,-1] %>% apply(2,sd)
  • \(\Sigma\): decov with:
    • regularisation: the exponent for a LKJ prior on the correlation matrix. A value of 1 (default) implies a joint uniform prior
    • concentration: the concentration parameter for a symmetric Dirichlet distribution. A value of 1 (default) implies a joint uniform distribution
    • shape and scale: the shape and scale parameters for a gamma prior on the scale and scale parameters of the decov prior. A value of 1 for both (default) simplifies the gamma prior to a unit-exponential distribution.

I will also overlay the raw data for comparison.

mckeon.rstanarm2 <- stan_glmer(PREDATION ~ SYMBIONT + (1 | BLOCK),
  data = mckeon,
  family = binomial(link = "logit"),
  prior_intercept = normal(0, 2.5, autoscale = FALSE),
  prior = normal(0, 6, autoscale = FALSE),
  prior_covariance = decov(1, 1, 1, 1),
  prior_PD = TRUE,
  iter = 5000,
  warmup = 1000,
  chains = 3,
  thin = 5,
  refresh = 0
)
mckeon.rstanarm2 %>%
  ggpredict(~SYMBIONT) %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.

mckeon.rstanarm2 %>%
  ggemmeans(~SYMBIONT) %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))

Now lets refit, conditioning on the data.

mckeon.rstanarm3 <- update(mckeon.rstanarm2, prior_PD = FALSE)
posterior_vs_prior(mckeon.rstanarm3,
  color_by = "vs", group_by = TRUE,
  facet_args = list(scales = "free_y")
)

Drawing from prior...

Conclusions:

  • in each case, the prior is substantially wider than the posterior, suggesting that the posterior is not biased towards the prior.
ggemmeans(mckeon.rstanarm3, ~SYMBIONT) %>% plot(show_data = TRUE)
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

ggpredict(mckeon.rstanarm3, ~SYMBIONT) %>% plot(show_data = TRUE)
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

In brms, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over fitting) and help stabilise the computations.

Unlike rstanarm, brms models must be compiled before they start sampling. For most models, the compilation of the stan code takes around 45 seconds.

mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (1|BLOCK),
                  family=binomial(link='logit'))
options(width=150)
mckeon.form %>% get_prior(data = mckeon)
                prior     class           coef group resp dpar nlpar lb ub       source
               (flat)         b                                                 default
               (flat)         b   SYMBIONTboth                             (vectorized)
               (flat)         b  SYMBIONTcrabs                             (vectorized)
               (flat)         b SYMBIONTshrimp                             (vectorized)
 student_t(3, 0, 2.5) Intercept                                                 default
 student_t(3, 0, 2.5)        sd                                       0         default
 student_t(3, 0, 2.5)        sd                BLOCK                  0    (vectorized)
 student_t(3, 0, 2.5)        sd      Intercept BLOCK                  0    (vectorized)
options(width=80)

The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]

When defining our own priors, we typically do not want them to be scaled.

If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):

  • \(\beta_0\): normal centred at 0 (since this is 0.5 on the logit scale) with a standard deviation of 2.5
    • mean of 0: since logit(0.5)
    • sd of 2: since this not too large on logit scale
    • alternatively, we could potentially use logistic(0,1)
  • \(\beta_1\): normal centred at 0 with a standard deviation of 6
    • sd of 6: since 2.5/model.matrix(~SYMBIONT, data=mckeon) %>% apply(2,sd)
  • \(\sigma_j\): half-cauchy with parameters 0 and 2.
  • \(\Sigma\): decov with:
    • regularisation: the exponent for a LKJ prior on the correlation matrix. A value of 1 (default) implies a joint uniform prior
    • concentration: the concentration parameter for a symmetric Dirichlet distribution. A value of 1 (default) implies a joint uniform distribution
    • shape and scale: the shape and scale parameters for a gamma prior on the scale and scale parameters of the decov prior. A value of 1 for both (default) simplifies the gamma prior to a unit-exponential distribution.

Note, for hierarchical models, the model will tend to want to have a large \(sigma\) in order to fit the data better. It is a good idea to regularise this tendency by applying a prior that has most mass around zero. Suitable candidates include:

  • half-t: as the degrees of freedom approach infinity, this will approach a half-normal
  • half-cauchy: this is essentially a half-t with 1 degree of freedom
  • exponential

I will also overlay the raw data for comparison.

mckeon |>
  group_by(SYMBIONT) |>
  summarise(
    Mean = logit(mean(PREDATION)),
    Median = logit(median(PREDATION)),
    sd = logit(abs(sd(PREDATION)))
  )
# A tibble: 4 × 4
  SYMBIONT   Mean Median      sd
  <fct>     <dbl>  <dbl>   <dbl>
1 none      2.20     Inf -0.810 
2 crabs     0.405    Inf  0.0105
3 shrimp    0.201    Inf  0.0417
4 both     -0.201   -Inf  0.0417
2.5 / model.matrix(~SYMBIONT, data = mckeon) %>% apply(2, sd)
   (Intercept)  SYMBIONTcrabs SYMBIONTshrimp   SYMBIONTboth 
           Inf       5.737305       5.737305       5.737305 
standist::visualize("student_t(3, 0, 2.5)",
  "gamma(2,0.5)",
  "cauchy(0,2)",
  xlim = c(-10, 25)
)

mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (1 | BLOCK),
  family = binomial(link = "logit")
)
get_prior(mckeon.form, data = mckeon)
                prior     class           coef group resp dpar nlpar lb ub
               (flat)         b                                           
               (flat)         b   SYMBIONTboth                            
               (flat)         b  SYMBIONTcrabs                            
               (flat)         b SYMBIONTshrimp                            
 student_t(3, 0, 2.5) Intercept                                           
 student_t(3, 0, 2.5)        sd                                       0   
 student_t(3, 0, 2.5)        sd                BLOCK                  0   
 student_t(3, 0, 2.5)        sd      Intercept BLOCK                  0   
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
      default
      default
 (vectorized)
 (vectorized)
priors <-
  prior(normal(0, 1.5), class = "Intercept") +
  prior(normal(0, 3), class = "b") +
  prior(student_t(3, 0, 1.5), class = "sd")

mckeon.brm2 <- brm(mckeon.form,
  data = mckeon,
  prior = priors,
  sample_prior = "only",
  iter = 5000,
  warmup = 2500,
  chains = 3, cores = 3,
  thin = 5,
  control = list(adapt_delta = 0.99, max_treedepth = 20),
  refresh = 0,
  backend = "rstan"
)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
mckeon.brm2 %>%
  ggpredict(~SYMBIONT) %>%
  plot(show_data = TRUE)
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

mckeon.brm2 %>%
  ggemmeans(~SYMBIONT) %>%
  plot(show_data = TRUE)
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

mckeon.brm2 |>
  conditional_effects() |>
  plot(points = TRUE)

mckeon.brm2 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  as.data.frame() %>%
  ggplot(aes(y = emmean, x = SYMBIONT)) +
  geom_hline(yintercept = c(5, -5), linetype = "dashed") +
  geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD)) +
  geom_point(
    data = mckeon, aes(y = PREDATION),
    position = position_jitter(width = 0.2, height = 0),
    alpha = 0.4, color = "red"
  )
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

Now lets refit, conditioning on the data.

mckeon.brm3 <- update(mckeon.brm2,
  sample_prior = "yes",
  cores = 3,
  refresh = 0
)
The desired updates require recompiling the model
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
mckeon.brm3 %>%
  ggpredict(~SYMBIONT) %>%
  plot(show_data = TRUE)
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

mckeon.brm3 %>%
  ggemmeans(~SYMBIONT) %>%
  plot(show_data = TRUE)
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

mckeon.brm3 |>
  conditional_effects() |>
  plot(points = TRUE)

mckeon.brm3 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  as.data.frame() %>%
  ggplot(aes(y = emmean, x = SYMBIONT)) +
  geom_hline(yintercept = c(5, -5), linetype = "dashed") +
  geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD)) +
  geom_point(
    data = mckeon, aes(y = PREDATION),
    position = position_jitter(width = 0.2, height = 0),
    alpha = 0.4, color = "red"
  )
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.

mckeon.brm3 %>% get_variables()
 [1] "b_Intercept"           "b_SYMBIONTcrabs"       "b_SYMBIONTshrimp"     
 [4] "b_SYMBIONTboth"        "sd_BLOCK__Intercept"   "Intercept"            
 [7] "r_BLOCK[1,Intercept]"  "r_BLOCK[2,Intercept]"  "r_BLOCK[3,Intercept]" 
[10] "r_BLOCK[4,Intercept]"  "r_BLOCK[5,Intercept]"  "r_BLOCK[6,Intercept]" 
[13] "r_BLOCK[7,Intercept]"  "r_BLOCK[8,Intercept]"  "r_BLOCK[9,Intercept]" 
[16] "r_BLOCK[10,Intercept]" "prior_Intercept"       "prior_b"              
[19] "prior_sd_BLOCK"        "lprior"                "lp__"                 
[22] "accept_stat__"         "stepsize__"            "treedepth__"          
[25] "n_leapfrog__"          "divergent__"           "energy__"             
mckeon.brm3 %>%
  hypothesis("SYMBIONTcrabs=0") %>%
  plot()

mckeon.brm3 %>%
  hypothesis("SYMBIONTshrimp=0") %>%
  plot()

mckeon.brm3 |> SUYR_prior_and_posterior()

While we are here, we might like to explore a random intercept/slope model

mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (SYMBIONT | BLOCK),
  family = binomial(link = "logit")
)
get_prior(mckeon.form, mckeon)
                prior     class           coef group resp dpar nlpar lb ub
               (flat)         b                                           
               (flat)         b   SYMBIONTboth                            
               (flat)         b  SYMBIONTcrabs                            
               (flat)         b SYMBIONTshrimp                            
               lkj(1)       cor                                           
               lkj(1)       cor                BLOCK                      
 student_t(3, 0, 2.5) Intercept                                           
 student_t(3, 0, 2.5)        sd                                       0   
 student_t(3, 0, 2.5)        sd                BLOCK                  0   
 student_t(3, 0, 2.5)        sd      Intercept BLOCK                  0   
 student_t(3, 0, 2.5)        sd   SYMBIONTboth BLOCK                  0   
 student_t(3, 0, 2.5)        sd  SYMBIONTcrabs BLOCK                  0   
 student_t(3, 0, 2.5)        sd SYMBIONTshrimp BLOCK                  0   
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
      default
 (vectorized)
      default
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
priors <-
  prior(normal(0, 1.5), class = "Intercept") +
  prior(normal(0, 3), class = "b") +
  prior(student_t(3, 0, 1.5), class = "sd", coef = "Intercept", group = "BLOCK") +
  prior(student_t(3, 0, 2), class = "sd") +
  prior(lkj_corr_cholesky(1), class = "cor")

mckeon.brm4 <- brm(mckeon.form,
  data = mckeon,
  prior = priors,
  sample_prior = "yes",
  iter = 5000,
  warmup = 2500,
  chains = 3, cores = 3,
  thin = 5,
  refresh = 0,
  control = list(adapt_delta = 0.99, max_treedepth = 20),
  backend = "rstan"
)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
mckeon.brm4 %>% get_variables()
 [1] "b_Intercept"                             
 [2] "b_SYMBIONTcrabs"                         
 [3] "b_SYMBIONTshrimp"                        
 [4] "b_SYMBIONTboth"                          
 [5] "sd_BLOCK__Intercept"                     
 [6] "sd_BLOCK__SYMBIONTcrabs"                 
 [7] "sd_BLOCK__SYMBIONTshrimp"                
 [8] "sd_BLOCK__SYMBIONTboth"                  
 [9] "cor_BLOCK__Intercept__SYMBIONTcrabs"     
[10] "cor_BLOCK__Intercept__SYMBIONTshrimp"    
[11] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp"
[12] "cor_BLOCK__Intercept__SYMBIONTboth"      
[13] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth"  
[14] "cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth" 
[15] "Intercept"                               
[16] "r_BLOCK[1,Intercept]"                    
[17] "r_BLOCK[2,Intercept]"                    
[18] "r_BLOCK[3,Intercept]"                    
[19] "r_BLOCK[4,Intercept]"                    
[20] "r_BLOCK[5,Intercept]"                    
[21] "r_BLOCK[6,Intercept]"                    
[22] "r_BLOCK[7,Intercept]"                    
[23] "r_BLOCK[8,Intercept]"                    
[24] "r_BLOCK[9,Intercept]"                    
[25] "r_BLOCK[10,Intercept]"                   
[26] "r_BLOCK[1,SYMBIONTcrabs]"                
[27] "r_BLOCK[2,SYMBIONTcrabs]"                
[28] "r_BLOCK[3,SYMBIONTcrabs]"                
[29] "r_BLOCK[4,SYMBIONTcrabs]"                
[30] "r_BLOCK[5,SYMBIONTcrabs]"                
[31] "r_BLOCK[6,SYMBIONTcrabs]"                
[32] "r_BLOCK[7,SYMBIONTcrabs]"                
[33] "r_BLOCK[8,SYMBIONTcrabs]"                
[34] "r_BLOCK[9,SYMBIONTcrabs]"                
[35] "r_BLOCK[10,SYMBIONTcrabs]"               
[36] "r_BLOCK[1,SYMBIONTshrimp]"               
[37] "r_BLOCK[2,SYMBIONTshrimp]"               
[38] "r_BLOCK[3,SYMBIONTshrimp]"               
[39] "r_BLOCK[4,SYMBIONTshrimp]"               
[40] "r_BLOCK[5,SYMBIONTshrimp]"               
[41] "r_BLOCK[6,SYMBIONTshrimp]"               
[42] "r_BLOCK[7,SYMBIONTshrimp]"               
[43] "r_BLOCK[8,SYMBIONTshrimp]"               
[44] "r_BLOCK[9,SYMBIONTshrimp]"               
[45] "r_BLOCK[10,SYMBIONTshrimp]"              
[46] "r_BLOCK[1,SYMBIONTboth]"                 
[47] "r_BLOCK[2,SYMBIONTboth]"                 
[48] "r_BLOCK[3,SYMBIONTboth]"                 
[49] "r_BLOCK[4,SYMBIONTboth]"                 
[50] "r_BLOCK[5,SYMBIONTboth]"                 
[51] "r_BLOCK[6,SYMBIONTboth]"                 
[52] "r_BLOCK[7,SYMBIONTboth]"                 
[53] "r_BLOCK[8,SYMBIONTboth]"                 
[54] "r_BLOCK[9,SYMBIONTboth]"                 
[55] "r_BLOCK[10,SYMBIONTboth]"                
[56] "prior_Intercept"                         
[57] "prior_b"                                 
[58] "prior_sd_BLOCK__Intercept"               
[59] "prior_sd_BLOCK__SYMBIONTcrabs"           
[60] "prior_sd_BLOCK__SYMBIONTshrimp"          
[61] "prior_sd_BLOCK__SYMBIONTboth"            
[62] "prior_cor_BLOCK"                         
[63] "lprior"                                  
[64] "lp__"                                    
[65] "accept_stat__"                           
[66] "stepsize__"                              
[67] "treedepth__"                             
[68] "n_leapfrog__"                            
[69] "divergent__"                             
[70] "energy__"                                
mckeon.brm4 %>%
  hypothesis("SYMBIONTcrabs=0") %>%
  plot()

mckeon.brm4 %>%
  hypothesis("SYMBIONTshrimp=0") %>%
  plot()

mckeon.brm4 %>% SUYR_prior_and_posterior()

mckeon.brm3 %>% SUYR_prior_and_posterior()

mckeon.brm4 %>%
  conditional_effects() %>%
  plot(points = TRUE)

mckeon.brm3 %>%
  conditional_effects() %>%
  plot(points = TRUE)

mckeon.brm4 |> summary()
 Family: binomial 
  Links: mu = logit 
Formula: PREDATION | trials(1) ~ SYMBIONT + (SYMBIONT | BLOCK) 
   Data: mckeon (Number of observations: 80) 
  Draws: 3 chains, each with iter = 5000; warmup = 2500; thin = 5;
         total post-warmup draws = 1500

Multilevel Hyperparameters:
~BLOCK (Number of levels: 10) 
                                  Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                         1.62      1.10     0.09     4.18 1.00
sd(SYMBIONTcrabs)                     4.84      4.48     0.32    15.86 1.00
sd(SYMBIONTshrimp)                    3.50      2.83     0.22    11.40 1.00
sd(SYMBIONTboth)                      3.62      3.04     0.23    11.88 1.00
cor(Intercept,SYMBIONTcrabs)          0.29      0.39    -0.56     0.89 1.00
cor(Intercept,SYMBIONTshrimp)         0.25      0.41    -0.64     0.88 1.00
cor(SYMBIONTcrabs,SYMBIONTshrimp)     0.49      0.36    -0.42     0.94 1.00
cor(Intercept,SYMBIONTboth)           0.20      0.42    -0.63     0.86 1.00
cor(SYMBIONTcrabs,SYMBIONTboth)       0.40      0.37    -0.52     0.90 1.00
cor(SYMBIONTshrimp,SYMBIONTboth)      0.43      0.39    -0.56     0.94 1.00
                                  Bulk_ESS Tail_ESS
sd(Intercept)                          878     1220
sd(SYMBIONTcrabs)                     1078     1226
sd(SYMBIONTshrimp)                     927     1280
sd(SYMBIONTboth)                      1103     1196
cor(Intercept,SYMBIONTcrabs)          1016     1145
cor(Intercept,SYMBIONTshrimp)         1270     1316
cor(SYMBIONTcrabs,SYMBIONTshrimp)     1216     1328
cor(Intercept,SYMBIONTboth)           1091     1161
cor(SYMBIONTcrabs,SYMBIONTboth)        950     1343
cor(SYMBIONTshrimp,SYMBIONTboth)      1245     1413

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          2.59      0.97     1.00     4.71 1.00     1175     1420
SYMBIONTcrabs     -1.17      1.78    -4.49     2.66 1.00     1283     1493
SYMBIONTshrimp    -1.95      1.62    -5.06     1.50 1.00     1346     1312
SYMBIONTboth      -3.06      1.53    -6.05     0.10 1.00     1436     1541

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
mckeon.brm4 %>%
  posterior_samples() %>%
  dplyr::select(-`lp__`) %>%
  pivot_longer(everything(), names_to = "key") %>%
  filter(!str_detect(key, "^r")) %>%
  mutate(
    Type = ifelse(str_detect(key, "prior"), "Prior", "Posterior"),
    ## Class = ifelse(str_detect(key, 'Intercept'),  'Intercept',
    ##         ifelse(str_detect(key, 'b'),  'b', 'sigma')),
    Class = case_when(
      str_detect(key, "(^b|^prior).*Intercept$") ~ "Intercept",
      str_detect(key, "b_SYMBIONT.*|prior_b_SYMBIONT.*") &
        !str_detect(key, ".*:.*") ~ "SYMBIONT",
      str_detect(key, "sd") ~ "sd",
      str_detect(key, "^cor|prior_cor") ~ "cor",
      str_detect(key, "sigma") ~ "sigma"
    ),
    Par = str_replace(key, "b_", "")
  ) %>%
  ggplot(aes(x = Type, y = value, color = Par)) +
  stat_pointinterval(position = position_dodge()) +
  facet_wrap(~Class, scales = "free")
Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
recommended alternatives.

We can compare the two models using LOO

(l.1 <- mckeon.brm3 %>% loo())
Warning: Found 1 observations with a pareto_k > 0.7 in model '.'. We recommend
to set 'moment_match = TRUE' in order to perform moment matching for
problematic observations.

Computed from 1500 by 80 log-likelihood matrix.

         Estimate   SE
elpd_loo    -28.0  6.8
p_loo         8.3  2.4
looic        55.9 13.6
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.2]).

Pareto k diagnostic values:
                          Count Pct.    Min. ESS
(-Inf, 0.69]   (good)     79    98.8%   93      
   (0.69, 1]   (bad)       1     1.2%   <NA>    
    (1, Inf)   (very bad)  0     0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
(l.2 <- mckeon.brm4 %>% loo())
Warning: Found 4 observations with a pareto_k > 0.7 in model '.'. We recommend
to set 'moment_match = TRUE' in order to perform moment matching for
problematic observations.

Computed from 1500 by 80 log-likelihood matrix.

         Estimate   SE
elpd_loo    -24.5  5.7
p_loo        10.1  2.9
looic        49.0 11.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.1]).

Pareto k diagnostic values:
                          Count Pct.    Min. ESS
(-Inf, 0.69]   (good)     76    95.0%   210     
   (0.69, 1]   (bad)       4     5.0%   <NA>    
    (1, Inf)   (very bad)  0     0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
loo_compare(l.1, l.2)
  elpd_diff se_diff
.  0.0       0.0   
. -3.4       2.2   

It appears that the random intercept/slope model is no better than the simpler random intercept model.

mckeon.brm4 %>%
  posterior_samples() %>%
  dplyr::select(-`lp__`) %>%
  pivot_longer(everything(), names_to = "key") %>%
  filter(!str_detect(key, "^r")) %>%
  mutate(
    Type = ifelse(str_detect(key, "prior"), "Prior", "Posterior"),
    Class = case_when(
      str_detect(key, "(^b|^prior).*Intercept$") ~ "Intercept",
      str_detect(key, "b_SYMBIONT.*|prior_b") ~ "TREATMENT",
      str_detect(key, "sd") ~ "sd",
      str_detect(key, "^cor|prior_cor") ~ "cor",
      str_detect(key, "sigma") ~ "sigma"
    ),
    Par = str_replace(key, "b_", "")
  ) %>%
  ggplot(aes(x = Type, y = value, color = Par)) +
  stat_pointinterval(position = position_dodge(), show.legend = FALSE) +
  facet_wrap(~Class, scales = "free")
Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
recommended alternatives.

6 MCMC sampling diagnostics

The bayesplot package offers a range of MCMC diagnostics as well as Posterior Probability Checks (PPC), all of which have a convenient plot() interface. Lets start with the MCMC diagnostics.

available_mcmc()
bayesplot MCMC module:
  mcmc_acf
  mcmc_acf_bar
  mcmc_areas
  mcmc_areas_ridges
  mcmc_combo
  mcmc_dens
  mcmc_dens_chains
  mcmc_dens_overlay
  mcmc_hex
  mcmc_hist
  mcmc_hist_by_chain
  mcmc_intervals
  mcmc_neff
  mcmc_neff_hist
  mcmc_nuts_acceptance
  mcmc_nuts_divergence
  mcmc_nuts_energy
  mcmc_nuts_stepsize
  mcmc_nuts_treedepth
  mcmc_pairs
  mcmc_parcoord
  mcmc_rank_ecdf
  mcmc_rank_hist
  mcmc_rank_overlay
  mcmc_recover_hist
  mcmc_recover_intervals
  mcmc_recover_scatter
  mcmc_rhat
  mcmc_rhat_hist
  mcmc_scatter
  mcmc_trace
  mcmc_trace_highlight
  mcmc_violin

Of these, we will focus on:

  • trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
pars <- mckeon.brm3 |> get_variables()
pars <- pars %>%
  str_extract("^b.Intercept|^b_SYMBIONT.*|[sS]igma|^sd.*") %>%
  na.omit()
pars
[1] "b_Intercept"         "b_SYMBIONTcrabs"     "b_SYMBIONTshrimp"   
[4] "b_SYMBIONTboth"      "sd_BLOCK__Intercept"
attr(,"na.action")
 [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
attr(,"class")
[1] "omit"
mckeon.brm3 %>% mcmc_plot(type = "trace", variables = pars)
Warning: The following arguments were unrecognized and ignored: variables
No divergences to plot.

# OR
mckeon.brm3 %>% mcmc_plot(
  type = "trace",
  variable = "^b.Intercept|^b_SYMBIONT.*|[sS]igma|^sd.*",
  regex = TRUE
)
No divergences to plot.

The chains appear well mixed and very similar

  • acf_bar (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain
mckeon.brm3 %>% mcmc_plot(type = "acf_bar", variable = pars)

## OR
mckeon.brm3 %>% mcmc_plot(
  type = "acf_bar",
  variable = "^b.Intercept|^b_SYMBIONT.*|[sS]igma|^sd.*",
  regex = TRUE
)

There is no evidence of auto-correlation in the MCMC samples

  • rhat_hist: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
mckeon.brm3 %>% mcmc_plot(type = "rhat_hist")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

  • neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

    If the ratios are low, tightening the priors may help.

mckeon.brm3 %>% mcmc_plot(type = "neff_hist")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

More diagnostics
mckeon.brm3 %>% mcmc_plot(type = "combo", variable = pars)

mckeon.brm3 %>% mcmc_plot(type = "violin", variable = pars)

The rstan package offers a range of MCMC diagnostics. Lets start with the MCMC diagnostics.

Of these, we will focus on:

  • stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
mckeon.brm4 %>% get_variables()
 [1] "b_Intercept"                             
 [2] "b_SYMBIONTcrabs"                         
 [3] "b_SYMBIONTshrimp"                        
 [4] "b_SYMBIONTboth"                          
 [5] "sd_BLOCK__Intercept"                     
 [6] "sd_BLOCK__SYMBIONTcrabs"                 
 [7] "sd_BLOCK__SYMBIONTshrimp"                
 [8] "sd_BLOCK__SYMBIONTboth"                  
 [9] "cor_BLOCK__Intercept__SYMBIONTcrabs"     
[10] "cor_BLOCK__Intercept__SYMBIONTshrimp"    
[11] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp"
[12] "cor_BLOCK__Intercept__SYMBIONTboth"      
[13] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth"  
[14] "cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth" 
[15] "Intercept"                               
[16] "r_BLOCK[1,Intercept]"                    
[17] "r_BLOCK[2,Intercept]"                    
[18] "r_BLOCK[3,Intercept]"                    
[19] "r_BLOCK[4,Intercept]"                    
[20] "r_BLOCK[5,Intercept]"                    
[21] "r_BLOCK[6,Intercept]"                    
[22] "r_BLOCK[7,Intercept]"                    
[23] "r_BLOCK[8,Intercept]"                    
[24] "r_BLOCK[9,Intercept]"                    
[25] "r_BLOCK[10,Intercept]"                   
[26] "r_BLOCK[1,SYMBIONTcrabs]"                
[27] "r_BLOCK[2,SYMBIONTcrabs]"                
[28] "r_BLOCK[3,SYMBIONTcrabs]"                
[29] "r_BLOCK[4,SYMBIONTcrabs]"                
[30] "r_BLOCK[5,SYMBIONTcrabs]"                
[31] "r_BLOCK[6,SYMBIONTcrabs]"                
[32] "r_BLOCK[7,SYMBIONTcrabs]"                
[33] "r_BLOCK[8,SYMBIONTcrabs]"                
[34] "r_BLOCK[9,SYMBIONTcrabs]"                
[35] "r_BLOCK[10,SYMBIONTcrabs]"               
[36] "r_BLOCK[1,SYMBIONTshrimp]"               
[37] "r_BLOCK[2,SYMBIONTshrimp]"               
[38] "r_BLOCK[3,SYMBIONTshrimp]"               
[39] "r_BLOCK[4,SYMBIONTshrimp]"               
[40] "r_BLOCK[5,SYMBIONTshrimp]"               
[41] "r_BLOCK[6,SYMBIONTshrimp]"               
[42] "r_BLOCK[7,SYMBIONTshrimp]"               
[43] "r_BLOCK[8,SYMBIONTshrimp]"               
[44] "r_BLOCK[9,SYMBIONTshrimp]"               
[45] "r_BLOCK[10,SYMBIONTshrimp]"              
[46] "r_BLOCK[1,SYMBIONTboth]"                 
[47] "r_BLOCK[2,SYMBIONTboth]"                 
[48] "r_BLOCK[3,SYMBIONTboth]"                 
[49] "r_BLOCK[4,SYMBIONTboth]"                 
[50] "r_BLOCK[5,SYMBIONTboth]"                 
[51] "r_BLOCK[6,SYMBIONTboth]"                 
[52] "r_BLOCK[7,SYMBIONTboth]"                 
[53] "r_BLOCK[8,SYMBIONTboth]"                 
[54] "r_BLOCK[9,SYMBIONTboth]"                 
[55] "r_BLOCK[10,SYMBIONTboth]"                
[56] "prior_Intercept"                         
[57] "prior_b"                                 
[58] "prior_sd_BLOCK__Intercept"               
[59] "prior_sd_BLOCK__SYMBIONTcrabs"           
[60] "prior_sd_BLOCK__SYMBIONTshrimp"          
[61] "prior_sd_BLOCK__SYMBIONTboth"            
[62] "prior_cor_BLOCK"                         
[63] "lprior"                                  
[64] "lp__"                                    
[65] "accept_stat__"                           
[66] "stepsize__"                              
[67] "treedepth__"                             
[68] "n_leapfrog__"                            
[69] "divergent__"                             
[70] "energy__"                                
pars <- mckeon.brm4 %>% get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") %>% na.omit()

mckeon.brm4$fit %>%
  stan_trace(pars = pars)

mckeon.brm3$fit |> stan_trace()
'pars' not specified. Showing first 10 parameters by default.

The chains appear well mixed and very similar

  • stan_acf (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain
mckeon.brm4$fit %>%
  stan_ac(pars = pars)

mckeon.brm3$fit |> stan_ac()
'pars' not specified. Showing first 10 parameters by default.

There is no evidence of auto-correlation in the MCMC samples

  • stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
mckeon.brm3$fit %>% stan_rhat()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

All Rhat values are below 1.05, suggesting the chains have converged.

  • stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

    If the ratios are low, tightening the priors may help.

mckeon.brm3$fit %>% stan_ess()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Ratios all very high.

mckeon.brm4$fit %>%
  stan_dens(separate_chains = TRUE, pars = pars)

The ggmean package also as a set of MCMC diagnostic functions. Lets start with the MCMC diagnostics.

Of these, we will focus on:

  • ggs_traceplot: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e, should not be displaced above or below any other trace for that parameter).
## mckeon.ggs <- mckeon.brm3 %>% ggs(burnin = FALSE, inc_warmup = FALSE)
## mckeon.ggs %>% ggs_traceplot()

The chains appear well mixed and very similar

  • gss_autocorrelation (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain
## ggs_autocorrelation(mckeon.ggs)

There is no evidence of auto-correlation in the MCMC samples

  • stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
## ggs_Rhat(mckeon.ggs)

All Rhat values are below 1.05, suggesting the chains have converged.

  • stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).

    If the ratios are low, tightening the priors may help.

## ggs_effective(mckeon.ggs)

Ratios all very high.

More diagnostics
## ggs_crosscorrelation(mckeon.ggs)
## ggs_grb(mckeon.ggs)

7 Model validation

Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.

available_ppc()
bayesplot PPC module:
  ppc_bars
  ppc_bars_grouped
  ppc_boxplot
  ppc_dens
  ppc_dens_overlay
  ppc_dens_overlay_grouped
  ppc_ecdf_overlay
  ppc_ecdf_overlay_grouped
  ppc_error_binned
  ppc_error_hist
  ppc_error_hist_grouped
  ppc_error_scatter
  ppc_error_scatter_avg
  ppc_error_scatter_avg_grouped
  ppc_error_scatter_avg_vs_x
  ppc_freqpoly
  ppc_freqpoly_grouped
  ppc_hist
  ppc_intervals
  ppc_intervals_grouped
  ppc_km_overlay
  ppc_km_overlay_grouped
  ppc_loo_intervals
  ppc_loo_pit_ecdf
  ppc_loo_pit_overlay
  ppc_loo_pit_qq
  ppc_loo_ribbon
  ppc_pit_ecdf
  ppc_pit_ecdf_grouped
  ppc_ribbon
  ppc_ribbon_grouped
  ppc_rootogram
  ppc_scatter
  ppc_scatter_avg
  ppc_scatter_avg_grouped
  ppc_stat
  ppc_stat_2d
  ppc_stat_freqpoly
  ppc_stat_freqpoly_grouped
  ppc_stat_grouped
  ppc_violin_grouped
  • dens_overlay: plots the density distribution of the observed data (black line) overlayed on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
mckeon.brm3 |> pp_check(type = "dens_overlay", nsamples = 250)
Warning: Argument 'nsamples' is deprecated. Please use argument 'ndraws'
instead.

The model draws appear to be consistent with the observed data.

  • error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. Note, this is not really that useful for models that involve a binomial response
mckeon.brm4 %>% pp_check(type = "error_scatter_avg")
Using all posterior draws for ppc type 'error_scatter_avg' by default.

This is not really interpretable

  • intervals: plots the observed data overlayed on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
mckeon.brm4 %>% pp_check(group = "BLOCK", type = "intervals")
Using all posterior draws for ppc type 'intervals' by default.
Warning: The following arguments were unrecognized and ignored: group

The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.

# library(shinystan)
# launch_shinystan(mckeon.brm2)

DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.

We need to supply:

  • simulated (predicted) responses associated with each observation.
  • observed values
  • fitted (predicted) responses (averaged) associated with each observation
preds <- mckeon.brm3 %>% posterior_predict(nsamples = 250, summary = FALSE)
Warning: Argument 'nsamples' is deprecated. Please use argument 'ndraws'
instead.
mckeon.resids <- createDHARMa(
  simulatedResponse = t(preds),
  observedResponse = mckeon$PREDATION,
  fittedPredictedResponse = apply(preds, 2, median),
  integerResponse = TRUE
)
plot(mckeon.resids, quantreg = TRUE)

mckeon.resids <- make_brms_dharma_res(mckeon.brm3, integerResponse = FALSE)
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(mckeon.resids)) +
  wrap_elements(~ plotResiduals(mckeon.resids, form = factor(rep(1, nrow(mckeon))))) +
  wrap_elements(~ plotResiduals(mckeon.resids, quantreg = FALSE)) +
  wrap_elements(~ testDispersion(mckeon.resids))

Conclusions:

  • the simulated residuals do not suggest any issues with the residuals
  • there is no evidence of a lack of fit.

8 Partial effects plots

mckeon.brm3 |>
  conditional_effects() |>
  plot(points = TRUE)

mckeon.brm3 %>%
  ggpredict() %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))
Note: uncertainty of error terms are not taken into account. Consider
  setting `interval` to "prediction". This will call `posterior_predict()`
  instead of `posterior_epred()`.

mckeon.brm3 %>%
  ggemmeans(~SYMBIONT) %>%
  plot(show_data = TRUE, jitter = c(0.5, 0))

## Partial residuals in binomial models are too confusing for the average viewer
## as they will yeild values that are not exactly 0 or 1 and this seems wrong.
## Partial.obs <- mckeon.brm3$data %>%
##     mutate(Pred = predict(mckeon.brm3, re.form=NA)[,'Estimate'],
##            Resid = resid(mckeon.brm3)[,'Estimate'],
##            Obs = Pred + Resid)

mckeon.brm3 %>%
  epred_draws(newdata = mckeon, re_formula = NA) %>%
  median_hdci() %>%
  ggplot(aes(x = SYMBIONT, y = .epred)) +
  geom_pointrange(aes(ymin = .lower, ymax = .upper)) +
  geom_line() +
  ## geom_point(data = Partial.obs,  aes(y = Obs,  x = SYMBIONT),
  ##            position = position_nudge(x = 0.1, y = 0), color = "red") +
  geom_point(
    data = mckeon, aes(y = PREDATION, x = SYMBIONT), alpha = 0.2,
    position = position_jitter(width = 0.2, height = 0)
  )

9 Model investigation

The summary() method generates simple summaries (mean, standard deviation as well as 10, 50 and 90 percentiles).

mckeon.brm4 %>% summary()
 Family: binomial 
  Links: mu = logit 
Formula: PREDATION | trials(1) ~ SYMBIONT + (SYMBIONT | BLOCK) 
   Data: mckeon (Number of observations: 80) 
  Draws: 3 chains, each with iter = 5000; warmup = 2500; thin = 5;
         total post-warmup draws = 1500

Multilevel Hyperparameters:
~BLOCK (Number of levels: 10) 
                                  Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                         1.62      1.10     0.09     4.18 1.00
sd(SYMBIONTcrabs)                     4.84      4.48     0.32    15.86 1.00
sd(SYMBIONTshrimp)                    3.50      2.83     0.22    11.40 1.00
sd(SYMBIONTboth)                      3.62      3.04     0.23    11.88 1.00
cor(Intercept,SYMBIONTcrabs)          0.29      0.39    -0.56     0.89 1.00
cor(Intercept,SYMBIONTshrimp)         0.25      0.41    -0.64     0.88 1.00
cor(SYMBIONTcrabs,SYMBIONTshrimp)     0.49      0.36    -0.42     0.94 1.00
cor(Intercept,SYMBIONTboth)           0.20      0.42    -0.63     0.86 1.00
cor(SYMBIONTcrabs,SYMBIONTboth)       0.40      0.37    -0.52     0.90 1.00
cor(SYMBIONTshrimp,SYMBIONTboth)      0.43      0.39    -0.56     0.94 1.00
                                  Bulk_ESS Tail_ESS
sd(Intercept)                          878     1220
sd(SYMBIONTcrabs)                     1078     1226
sd(SYMBIONTshrimp)                     927     1280
sd(SYMBIONTboth)                      1103     1196
cor(Intercept,SYMBIONTcrabs)          1016     1145
cor(Intercept,SYMBIONTshrimp)         1270     1316
cor(SYMBIONTcrabs,SYMBIONTshrimp)     1216     1328
cor(Intercept,SYMBIONTboth)           1091     1161
cor(SYMBIONTcrabs,SYMBIONTboth)        950     1343
cor(SYMBIONTshrimp,SYMBIONTboth)      1245     1413

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          2.59      0.97     1.00     4.71 1.00     1175     1420
SYMBIONTcrabs     -1.17      1.78    -4.49     2.66 1.00     1283     1493
SYMBIONTshrimp    -1.95      1.62    -5.06     1.50 1.00     1346     1312
SYMBIONTboth      -3.06      1.53    -6.05     0.10 1.00     1436     1541

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conclusions:

  • the coefficients are presented on a logit scale. Whilst this is not relevant for the purpose of inference testing, it does make it difficult to interpret the coefficients.
  • if we exponentiate the coefficients (\(log(\frac{\rho}{1-\rho})\) -> \(\frac{\rho}{1-\rho}\)), they will be presented on a odds ratio scale, and thus:
    • the intercept (none symbionts) will be 26.07. That is, corals without a symbiont are 26.07 times more likely to be predated on than not predated on. The odds of predation in this the absence of symbionts is 26.07:1.
    • in the presence of a crab symbiont, the odds of being predated on are only 0.09 times that of the none symbiont group. That is, in the presence of a crab symbiont, the odds of predation decline by 91%.
    • in the presence of a shrimp symbiont, the odds of being predated on are only 0.05 times that of the none symbiont group. That is, in the presence of a shrimp symbiont, the odds of predation decline by 95%.
    • in the presence of both crab and shrimp symbionts, the odds of being predated on are only 0.02 times that of the none symbiont group. That is, in the presence of both crab and shrimp symbiont, the odds of predation decline by 98%.
  • if we back-transform the intercept full to the response scale (probability scale), (\(log(\frac{\rho}{1-\rho})\) -> \(\rho\)), the intercept is interpreted as the probability that corals will be predated in the absence of of symbionts is 1
# A draws_df: 500 iterations, 3 chains, and 21 variables
   b_Intercept b_SYMBIONTcrabs b_SYMBIONTshrimp b_SYMBIONTboth
1          3.1           -1.02             -1.7           -3.9
2          3.7           -1.96             -3.4           -4.2
3          2.4           -1.94             -3.6           -3.9
4          6.8           -4.01             -4.1           -5.9
5          2.5           -1.01             -0.9           -4.2
6          2.3           -3.07             -2.8           -4.2
7          2.0           -0.94             -2.1           -3.2
8          2.7           -2.28             -1.9           -3.3
9          3.2           -0.69             -1.4           -3.0
10         4.6           -4.57             -4.2           -4.8
   sd_BLOCK__Intercept Intercept r_BLOCK[1,Intercept] r_BLOCK[2,Intercept]
1                  2.4     1.455                -2.64                 -1.9
2                  3.1     1.306                -3.04                 -2.7
3                  2.4     0.075                -2.66                 -1.1
4                  2.9     3.270                -7.99                 -6.5
5                  3.2     0.967                -2.54                 -3.8
6                  2.4    -0.233                -1.52                 -2.2
7                  1.9     0.386                -0.74                 -2.2
8                  2.9     0.864                -2.07                 -3.0
9                  3.1     1.891                -3.54                 -3.6
10                 2.6     1.203                -2.14                 -3.8
# ... with 1490 more draws, and 13 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
variable median lower upper Pl Pg rhat ess_bulk ess_tail
b_Intercept 25.2702351 0.9731722 190.3082932 0.0020000 0.9980000 1.0018382 1431.535 1418.517
b_SYMBIONTcrabs 0.0874509 0.0012528 0.4262177 0.9953333 0.0046667 0.9985420 1595.508 1495.416
b_SYMBIONTshrimp 0.0526094 0.0015164 0.2689144 0.9993333 0.0006667 1.0009889 1435.310 1420.802
b_SYMBIONTboth 0.0197993 0.0001903 0.1158990 1.0000000 0.0000000 0.9987386 1553.495 1438.469
sd_BLOCK__Intercept 19.1409108 2.5111760 157.4892358 0.0000000 1.0000000 0.9998692 1315.546 1421.213
Intercept 2.4749236 0.0598998 10.8641434 0.1546667 0.8453333 1.0026818 1282.698 1310.144
r_BLOCK[1,Intercept] 0.0367128 0.0000353 0.3005098 0.9973333 0.0026667 1.0004560 1473.033 1445.810
r_BLOCK[2,Intercept] 0.1051144 0.0007807 0.7664818 0.9680000 0.0320000 1.0001461 1383.241 1381.792
r_BLOCK[3,Intercept] 0.0986019 0.0006385 0.7291968 0.9760000 0.0240000 1.0014251 1551.418 1380.249
r_BLOCK[4,Intercept] 0.0985206 0.0006915 0.7716159 0.9626667 0.0373333 1.0012582 1174.690 1347.793
r_BLOCK[5,Intercept] 0.9973739 0.0239876 7.0349416 0.5013333 0.4986667 1.0016138 1487.816 1418.790
r_BLOCK[6,Intercept] 5.0411518 0.0641411 68.4521289 0.0860000 0.9140000 1.0027318 1311.281 1456.605
r_BLOCK[7,Intercept] 29.0161571 0.3791065 2313.3286778 0.0086667 0.9913333 1.0010048 1448.485 1458.808
r_BLOCK[8,Intercept] 32.0222203 0.3477107 2736.1282610 0.0073333 0.9926667 1.0015554 1390.843 1343.437
r_BLOCK[9,Intercept] 29.6876120 0.3783989 2876.9798831 0.0113333 0.9886667 0.9987866 1583.382 1496.061
r_BLOCK[10,Intercept] 5.5186370 0.1181811 63.5122820 0.0773333 0.9226667 1.0021365 1333.557 1419.092
prior_Intercept 1.0762794 0.0100164 12.1587913 0.4840000 0.5160000 1.0006792 1536.272 1496.757
prior_b 0.9864571 0.0000325 146.8549199 0.5013333 0.4986667 0.9998620 1396.200 1382.339
prior_sd_BLOCK 3.1907922 1.0022070 109.3996505 0.0000000 1.0000000 1.0028776 1284.156 1314.934
lprior 0.0000067 0.0000000 0.0000470 1.0000000 0.0000000 0.9993926 1457.594 1459.149
lp__ 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.9998432 1322.466 1457.578
Warning: Dropping 'draws_df' class as required metadata was removed.
variable median lower upper Pl Pg rhat ess_bulk ess_tail
b_Intercept 25.2702351 0.9731722 190.3082932 0.0020000 0.9980000 0.9995241 1455.669 1414.413
b_SYMBIONTcrabs 0.0874509 0.0012528 0.4262177 0.9953333 0.0046667 1.0000964 1586.762 1485.701
b_SYMBIONTshrimp 0.0526094 0.0015164 0.2689144 0.9993333 0.0006667 0.9994376 1432.237 1414.413
b_SYMBIONTboth 0.0197993 0.0001903 0.1158990 1.0000000 0.0000000 0.9996000 1540.493 1403.442
sd_BLOCK__Intercept 19.1409108 2.5111760 157.4892358 0.0000000 1.0000000 1.0001473 1307.815 1414.607
mckeon.brm3$fit %>%
  tidyMCMC(
    estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval",
    rhat = TRUE, ess = TRUE
  )
# A tibble: 20 × 7
   term                   estimate std.error  conf.low conf.high  rhat   ess
   <chr>                     <dbl>     <dbl>     <dbl>     <dbl> <dbl> <int>
 1 b_Intercept             3.26        1.17    1.09       5.62   1.00   1426
 2 b_SYMBIONTcrabs        -2.46        1.03   -4.64      -0.600  0.998  1595
 3 b_SYMBIONTshrimp       -3.00        1.07   -4.92      -0.772  1.00   1439
 4 b_SYMBIONTboth         -3.96        1.17   -6.33      -1.88   0.999  1539
 5 sd_BLOCK__Intercept     3.12        1.07    1.29       5.12   1.000  1294
 6 Intercept               0.906       0.898  -0.886      2.64   1.00   1239
 7 r_BLOCK[1,Intercept]   -3.41        1.44   -6.38      -0.839  1.00   1485
 8 r_BLOCK[2,Intercept]   -2.30        1.32   -4.87       0.286  1.000  1368
 9 r_BLOCK[3,Intercept]   -2.34        1.29   -4.91       0.0603 1.00   1539
10 r_BLOCK[4,Intercept]   -2.31        1.27   -4.77       0.173  1.00   1157
11 r_BLOCK[5,Intercept]   -0.00230     1.18   -2.12       2.53   1.00   1476
12 r_BLOCK[6,Intercept]    1.74        1.39   -0.768      4.78   1.00   1347
13 r_BLOCK[7,Intercept]    3.73        2.22    0.0202     8.04   1.00   1430
14 r_BLOCK[8,Intercept]    3.80        2.25    0.0963     8.26   1.00   1358
15 r_BLOCK[9,Intercept]    3.77        2.34    0.205      8.54   0.999  1581
16 r_BLOCK[10,Intercept]   1.81        1.35   -0.892      4.39   1.00   1315
17 prior_Intercept         0.0679      1.50   -3.09       2.78   1.00   1534
18 prior_b                 0.00358     2.95   -5.53       6.09   1.000  1390
19 prior_sd_BLOCK          1.63        1.68    0.00220    4.70   1.00   1365
20 lprior                -12.1         1.56  -15.2       -9.38   0.999  1415

Conclusions:

see above

Due to the presence of a log transform in the predictor, it is better to use the regex version.

mckeon.brm3 %>% get_variables()
 [1] "b_Intercept"           "b_SYMBIONTcrabs"       "b_SYMBIONTshrimp"     
 [4] "b_SYMBIONTboth"        "sd_BLOCK__Intercept"   "Intercept"            
 [7] "r_BLOCK[1,Intercept]"  "r_BLOCK[2,Intercept]"  "r_BLOCK[3,Intercept]" 
[10] "r_BLOCK[4,Intercept]"  "r_BLOCK[5,Intercept]"  "r_BLOCK[6,Intercept]" 
[13] "r_BLOCK[7,Intercept]"  "r_BLOCK[8,Intercept]"  "r_BLOCK[9,Intercept]" 
[16] "r_BLOCK[10,Intercept]" "prior_Intercept"       "prior_b"              
[19] "prior_sd_BLOCK"        "lprior"                "lp__"                 
[22] "accept_stat__"         "stepsize__"            "treedepth__"          
[25] "n_leapfrog__"          "divergent__"           "energy__"             
mckeon.draw <- mckeon.brm3 %>%
  gather_draws(`b.Intercept.*|b_SYMBIONT.*`, regex = TRUE)
mckeon.draw
# A tibble: 6,000 × 5
# Groups:   .variable [4]
   .chain .iteration .draw .variable   .value
    <int>      <int> <int> <chr>        <dbl>
 1      1          1     1 b_Intercept   3.11
 2      1          2     2 b_Intercept   3.69
 3      1          3     3 b_Intercept   2.43
 4      1          4     4 b_Intercept   6.77
 5      1          5     5 b_Intercept   2.50
 6      1          6     6 b_Intercept   2.28
 7      1          7     7 b_Intercept   1.95
 8      1          8     8 b_Intercept   2.74
 9      1          9     9 b_Intercept   3.16
10      1         10    10 b_Intercept   4.60
# ℹ 5,990 more rows

We can then summarise this

mckeon.draw %>% median_hdci()
# A tibble: 4 × 7
  .variable        .value .lower .upper .width .point .interval
  <chr>             <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 b_Intercept        3.23   1.09  5.62    0.95 median hdci     
2 b_SYMBIONTboth    -3.92  -6.15 -1.69    0.95 median hdci     
3 b_SYMBIONTcrabs   -2.44  -4.64 -0.600   0.95 median hdci     
4 b_SYMBIONTshrimp  -2.94  -4.92 -0.772   0.95 median hdci     
## On a odd ratio scale
mckeon.draw %>%
  mutate(.value = exp(.value)) %>%
  median_hdci()
# A tibble: 4 × 7
  .variable         .value   .lower  .upper .width .point .interval
  <chr>              <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
1 b_Intercept      25.3    0.973    190.      0.95 median hdci     
2 b_SYMBIONTboth    0.0198 0.000190   0.115   0.95 median hdci     
3 b_SYMBIONTcrabs   0.0875 0.00125    0.426   0.95 median hdci     
4 b_SYMBIONTshrimp  0.0526 0.00204    0.271   0.95 median hdci     
mckeon.brm3 %>%
  gather_draws(`b_Intercept.*|b_SYMBIONT.*`, regex = TRUE) %>%
  ggplot() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  stat_slab(aes(
    x = .value, y = .variable,
    fill = stat(ggdist::cut_cdf_qi(cdf,
      .width = c(0.5, 0.8, 0.95),
      labels = scales::percent_format()
    ))
  ), color = "black") +
  scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)
Warning: `stat(ggdist::cut_cdf_qi(cdf, .width = c(0.5, 0.8, 0.95), labels =
scales::percent_format()))` was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(ggdist::cut_cdf_qi(cdf, .width = c(0.5, 0.8, 0.95),
  labels = scales::percent_format()))` instead.

mckeon.brm3 %>%
  gather_draws(`.Intercept.*|b_SYMBIONT.*`, regex = TRUE) %>%
  ggplot() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  stat_halfeye(aes(x = .value, y = .variable)) +
  theme_classic()

mckeon.brm3$fit %>% plot(type = "intervals")
'pars' not specified. Showing first 10 parameters by default.
ci_level: 0.8 (80% intervals)
outer_level: 0.95 (95% intervals)

mckeon.brm3 %>%
  gather_draws(`^b_.*`, regex = TRUE) %>%
  filter(.variable != "b_Intercept") %>%
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable)) +
  facet_wrap(~.variable, scales = "free")

mckeon.brm3 %>%
  gather_draws(`^b_.*`, regex = TRUE) %>%
  filter(.variable != "b_Intercept") %>%
  ggplot() +
  stat_halfeye(aes(x = .value, y = .variable)) +
  geom_vline(xintercept = 0, linetype = "dashed")

mckeon.brm3 %>%
  gather_draws(`^b_.*`, regex = TRUE) %>%
  filter(.variable != "b_Intercept") %>%
  ggplot() +
  geom_density_ridges(aes(x = .value, y = .variable), alpha = 0.4) +
  geom_vline(xintercept = 0, linetype = "dashed")
Picking joint bandwidth of 0.224

## Or in colour
mckeon.brm3 %>%
  gather_draws(`^b_.*`, regex = TRUE) %>%
  filter(.variable != "b_Intercept") %>%
  ggplot() +
  geom_density_ridges_gradient(
    aes(
      x = (.value),
      y = .variable,
      fill = stat(x)
    ),
    alpha = 0.4, colour = "white",
    quantile_lines = TRUE,
    quantiles = c(0.025, 0.975)
  ) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_continuous() +
  scale_fill_viridis_c(option = "C")
Picking joint bandwidth of 0.224

## Fractional scale
mckeon.brm3 %>%
  gather_draws(`^b_.*`, regex = TRUE) %>%
  filter(.variable != "b_Intercept") %>%
  ggplot() +
  geom_density_ridges_gradient(
    aes(
      x = exp(.value),
      y = .variable,
      fill = stat(x)
    ),
    alpha = 0.4, colour = "white",
    quantile_lines = TRUE,
    quantiles = c(0.025, 0.975)
  ) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_continuous(trans = scales::log2_trans()) +
  scale_fill_viridis_c(option = "C")
Picking joint bandwidth of 0.323

mckeon.brm3 %>% tidy_draws()
# A tibble: 1,500 × 30
   .chain .iteration .draw b_Intercept b_SYMBIONTcrabs b_SYMBIONTshrimp
    <int>      <int> <int>       <dbl>           <dbl>            <dbl>
 1      1          1     1        3.11          -1.02            -1.72 
 2      1          2     2        3.69          -1.96            -3.41 
 3      1          3     3        2.43          -1.94            -3.58 
 4      1          4     4        6.77          -4.01            -4.05 
 5      1          5     5        2.50          -1.01            -0.902
 6      1          6     6        2.28          -3.07            -2.80 
 7      1          7     7        1.95          -0.943           -2.15 
 8      1          8     8        2.74          -2.28            -1.91 
 9      1          9     9        3.16          -0.685           -1.39 
10      1         10    10        4.60          -4.57            -4.24 
# ℹ 1,490 more rows
# ℹ 24 more variables: b_SYMBIONTboth <dbl>, sd_BLOCK__Intercept <dbl>,
#   Intercept <dbl>, `r_BLOCK[1,Intercept]` <dbl>,
#   `r_BLOCK[2,Intercept]` <dbl>, `r_BLOCK[3,Intercept]` <dbl>,
#   `r_BLOCK[4,Intercept]` <dbl>, `r_BLOCK[5,Intercept]` <dbl>,
#   `r_BLOCK[6,Intercept]` <dbl>, `r_BLOCK[7,Intercept]` <dbl>,
#   `r_BLOCK[8,Intercept]` <dbl>, `r_BLOCK[9,Intercept]` <dbl>, …
mckeon.brm3 %>% spread_draws(`.*Intercept.*|b_SYMBIONT.*`, regex = TRUE)
# A tibble: 1,500 × 20
   .chain .iteration .draw b_Intercept b_SYMBIONTcrabs b_SYMBIONTshrimp
    <int>      <int> <int>       <dbl>           <dbl>            <dbl>
 1      1          1     1        3.11          -1.02            -1.72 
 2      1          2     2        3.69          -1.96            -3.41 
 3      1          3     3        2.43          -1.94            -3.58 
 4      1          4     4        6.77          -4.01            -4.05 
 5      1          5     5        2.50          -1.01            -0.902
 6      1          6     6        2.28          -3.07            -2.80 
 7      1          7     7        1.95          -0.943           -2.15 
 8      1          8     8        2.74          -2.28            -1.91 
 9      1          9     9        3.16          -0.685           -1.39 
10      1         10    10        4.60          -4.57            -4.24 
# ℹ 1,490 more rows
# ℹ 14 more variables: b_SYMBIONTboth <dbl>, sd_BLOCK__Intercept <dbl>,
#   Intercept <dbl>, `r_BLOCK[1,Intercept]` <dbl>,
#   `r_BLOCK[2,Intercept]` <dbl>, `r_BLOCK[3,Intercept]` <dbl>,
#   `r_BLOCK[4,Intercept]` <dbl>, `r_BLOCK[5,Intercept]` <dbl>,
#   `r_BLOCK[6,Intercept]` <dbl>, `r_BLOCK[7,Intercept]` <dbl>,
#   `r_BLOCK[8,Intercept]` <dbl>, `r_BLOCK[9,Intercept]` <dbl>, …
mckeon.brm3 %>%
  posterior_samples() %>%
  as_tibble()
Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
recommended alternatives.
# A tibble: 1,500 × 21
   b_Intercept b_SYMBIONTcrabs b_SYMBIONTshrimp b_SYMBIONTboth
         <dbl>           <dbl>            <dbl>          <dbl>
 1        3.11          -1.02            -1.72           -3.90
 2        3.69          -1.96            -3.41           -4.16
 3        2.43          -1.94            -3.58           -3.89
 4        6.77          -4.01            -4.05           -5.93
 5        2.50          -1.01            -0.902          -4.21
 6        2.28          -3.07            -2.80           -4.19
 7        1.95          -0.943           -2.15           -3.18
 8        2.74          -2.28            -1.91           -3.34
 9        3.16          -0.685           -1.39           -2.99
10        4.60          -4.57            -4.24           -4.80
# ℹ 1,490 more rows
# ℹ 17 more variables: sd_BLOCK__Intercept <dbl>, Intercept <dbl>,
#   `r_BLOCK[1,Intercept]` <dbl>, `r_BLOCK[2,Intercept]` <dbl>,
#   `r_BLOCK[3,Intercept]` <dbl>, `r_BLOCK[4,Intercept]` <dbl>,
#   `r_BLOCK[5,Intercept]` <dbl>, `r_BLOCK[6,Intercept]` <dbl>,
#   `r_BLOCK[7,Intercept]` <dbl>, `r_BLOCK[8,Intercept]` <dbl>,
#   `r_BLOCK[9,Intercept]` <dbl>, `r_BLOCK[10,Intercept]` <dbl>, …
mckeon.brm3 %>%
  bayes_R2(re.form = NA, summary = FALSE) %>%
  median_hdci()
          y       ymin      ymax .width .point .interval
1 0.1993422 0.04768985 0.3319253   0.95 median      hdci
mckeon.brm3 %>%
  bayes_R2(re.form = ~ (1 | BLOCK), summary = FALSE) %>%
  median_hdci()
          y      ymin     ymax .width .point .interval
1 0.6524866 0.5188042 0.762788   0.95 median      hdci
## if we had random intercept/slope
mckeon.brm4 %>%
  bayes_R2(re.form = ~ (SYMBIONT | BLOCK), summary = FALSE) %>%
  median_hdci()
          y      ymin      ymax .width .point .interval
1 0.7303833 0.5966863 0.8401578   0.95 median      hdci
mckeon.brm4 |> modelsummary(
  statistic = c("conf.low", "conf.high"),
  shape = term ~ statistic,
  exponentiate = FALSE
)
Warning: 
`modelsummary` uses the `performance` package to extract goodness-of-fit
statistics from models of this class. You can specify the statistics you wish
to compute by supplying a `metrics` argument to `modelsummary`, which will then
push it forward to `performance`. Acceptable values are: "all", "common",
"none", or a character vector of metrics names. For example: `modelsummary(mod,
metrics = c("RMSE", "R2")` Note that some metrics are computationally
expensive. See `?performance::performance` for details.
 This warning appears once per session.
(1)
Est. 2.5 % 97.5 %
b_Intercept 2.484 0.997 4.715
b_SYMBIONTcrabs -1.321 -4.494 2.665
b_SYMBIONTshrimp -1.996 -5.062 1.497
b_SYMBIONTboth -3.009 -6.054 0.103
sd_BLOCK__Intercept 1.491 0.089 4.180
sd_BLOCK__SYMBIONTcrabs 3.820 0.318 15.861
sd_BLOCK__SYMBIONTshrimp 2.834 0.216 11.399
sd_BLOCK__SYMBIONTboth 2.930 0.225 11.877
cor_BLOCK__Intercept__SYMBIONTcrabs 0.344 -0.558 0.891
cor_BLOCK__Intercept__SYMBIONTshrimp 0.305 -0.639 0.883
cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp 0.577 -0.417 0.943
cor_BLOCK__Intercept__SYMBIONTboth 0.235 -0.634 0.864
cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth 0.471 -0.518 0.895
cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth 0.527 -0.560 0.938
Num.Obs. 80
R2 0.730
R2 Marg. 0.200
ICC 0.9
ELPD -24.5
ELPD s.e. 5.7
LOOIC 49.0
LOOIC s.e. 11.3
WAIC 46.7
RMSE 0.20
mckeon.brm4 |> modelplot(exponentiate = FALSE)

10 Further analyses

In addition to comparing each of the symbiont types against the control group of no symbionts, it might be interesting to investigate whether there are any differences between the predation protection provided by crabs and shrimp, as well as whether having both crabs and shrimp symbionts is different to only a single symbiont type.

These contrasts can be explored via specific contrasts.

Table 1: Contrast coefficients
SYMBIONT Crab vs Shrimp One vs Both None vs Symbiont
none 0 0 1
crab 1 1/2 -1/3
shrimp -1 1/2 -1/3
both 0 -1 -1/3
mckeon.brm4 %>%
  emmeans(~SYMBIONT, type = "response") %>%
  pairs()
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
 contrast       odds.ratio lower.HPD upper.HPD
 none / crabs         3.75  0.007100      49.9
 none / shrimp        7.36  0.008196      90.2
 none / both         20.27  0.075297     245.6
 crabs / shrimp       2.00  0.000087      66.7
 crabs / both         6.22  0.007177     217.0
 shrimp / both        3.10  0.002235      51.6

Point estimate displayed: median 
Results are back-transformed from the log odds ratio scale 
HPD interval probability: 0.95 
mckeon.em <-
  mckeon.brm4 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  pairs() %>%
  gather_emmeans_draws() |>
  mutate(.value = exp(.value)) |>
  summarise(median_hdci(.value),
    Pl = mean(.value < 1),
    Pg = mean(.value > 1)
  )
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
## On the fractional scale
mckeon.em <- mckeon.brm4 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  pairs() %>%
  gather_emmeans_draws() %>%
  mutate(
    Eff = exp(.value),
    PEff = 100 * (Eff - 1)
  ) # ,
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
# Prob = plogis(.value))
mckeon.em %>% head()
# A tibble: 6 × 7
# Groups:   contrast [1]
  contrast     .chain .iteration .draw .value    Eff   PEff
  <chr>         <int>      <int> <int>  <dbl>  <dbl>  <dbl>
1 none - crabs     NA         NA     1  4.51  90.9   8995. 
2 none - crabs     NA         NA     2 -0.968  0.380  -62.0
3 none - crabs     NA         NA     3 -1.26   0.285  -71.5
4 none - crabs     NA         NA     4 -2.15   0.116  -88.4
5 none - crabs     NA         NA     5  0.520  1.68    68.3
6 none - crabs     NA         NA     6 -1.69   0.184  -81.6
mckeon.em %>%
  group_by(contrast) %>%
  dplyr::select(contrast, Eff) %>%
  median_hdi()
# A tibble: 56 × 7
   contrast       Eff    .lower .upper .width .point .interval
   <chr>        <dbl>     <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 crabs - both  6.22   0.00742   66.4   0.95 median hdi      
 2 crabs - both  6.22  66.6       92.8   0.95 median hdi      
 3 crabs - both  6.22  95.8      112.    0.95 median hdi      
 4 crabs - both  6.22 115.       128.    0.95 median hdi      
 5 crabs - both  6.22 145.       154.    0.95 median hdi      
 6 crabs - both  6.22 156.       157.    0.95 median hdi      
 7 crabs - both  6.22 166.       172.    0.95 median hdi      
 8 crabs - both  6.22 181.       188.    0.95 median hdi      
 9 crabs - both  6.22 197.       198.    0.95 median hdi      
10 crabs - both  6.22 203.       204.    0.95 median hdi      
# ℹ 46 more rows
mckeon.em %>%
  group_by(contrast) %>%
  summarize(Prob = sum(Eff > 1) / n())
# A tibble: 6 × 2
  contrast        Prob
  <chr>          <dbl>
1 crabs - both   0.85 
2 crabs - shrimp 0.664
3 none - both    0.97 
4 none - crabs   0.76 
5 none - shrimp  0.891
6 shrimp - both  0.749
## On a probability scale
mckeon.em <- mckeon.brm3 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  regrid() %>%
  pairs() %>%
  gather_emmeans_draws() %>%
  mutate(Eff = .value) # ,
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
mckeon.em %>% head()
# A tibble: 6 × 6
# Groups:   contrast [1]
  contrast     .chain .iteration .draw .value    Eff
  <chr>         <int>      <int> <int>  <dbl>  <dbl>
1 none - crabs     NA         NA     1 0.0674 0.0674
2 none - crabs     NA         NA     2 0.126  0.126 
3 none - crabs     NA         NA     3 0.299  0.299 
4 none - crabs     NA         NA     4 0.0583 0.0583
5 none - crabs     NA         NA     5 0.109  0.109 
6 none - crabs     NA         NA     6 0.594  0.594 
mckeon.em %>%
  group_by(contrast) %>%
  dplyr::select(contrast, Eff) %>%
  median_hdi()
# A tibble: 6 × 7
  contrast         Eff   .lower .upper .width .point .interval
  <chr>          <dbl>    <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 crabs - both   0.286 -0.0666   0.673   0.95 median hdi      
2 crabs - shrimp 0.102 -0.269    0.529   0.95 median hdi      
3 none - both    0.594  0.206    0.909   0.95 median hdi      
4 none - crabs   0.265  0.00675  0.619   0.95 median hdi      
5 none - shrimp  0.380  0.0466   0.734   0.95 median hdi      
6 shrimp - both  0.183 -0.219    0.593   0.95 median hdi      
## Cell means
mckeon.em <- emmeans(mckeon.brm3, ~SYMBIONT, type = "link") %>%
  gather_emmeans_draws()
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
mckeon.em %>%
  mutate(P = plogis(.value)) %>%
  median_hdci(P)
# A tibble: 4 × 7
  SYMBIONT     P .lower .upper .width .point .interval
  <fct>    <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 none     0.962 0.801   0.999   0.95 median hdci     
2 crabs    0.680 0.243   0.965   0.95 median hdci     
3 shrimp   0.562 0.144   0.927   0.95 median hdci     
4 both     0.341 0.0304  0.755   0.95 median hdci     
cmat <- cbind(
  "Crab vs shrimp" = c(0, 1, -1, 0),
  "Both vs One" = c(0, -1 / 2, -1 / 2, 1),
  "Any vs None" = c(-1, 1 / 3, 1 / 3, 1 / 3)
)

mckeon.em <- mckeon.brm3 |>
  emmeans(~SYMBIONT, type = "response") |>
  contrast(method = list(cmat))
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
mckeon.em <-
  mckeon.brm3 %>%
  emmeans(~SYMBIONT, type = "link") %>%
  contrast(method = list(cmat)) %>%
  gather_emmeans_draws() %>%
  mutate(Fit = exp(.value)) |>
  summarise(median_hdci(Fit),
    Pl =  mean(Fit < 0),
    Pg =  mean(Fit > 0)
  )
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
newdata <- emmeans(mckeon.brm3, ~SYMBIONT, type = "response") %>% as.data.frame()
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
head(newdata)
 SYMBIONT      prob lower.HPD upper.HPD
 none     0.9619341 0.8016964 0.9994676
 crabs    0.6799539 0.2676619 0.9788126
 shrimp   0.5620007 0.1439707 0.9267590
 both     0.3414782 0.0303738 0.7551121

Point estimate displayed: median 
Results are back-transformed from the logit scale 
HPD interval probability: 0.95 
ggplot(newdata, aes(y = prob, x = SYMBIONT)) +
  geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD))

mckeon.brm3 %>% bayes_R2(re.form = NA)
    Estimate  Est.Error       Q2.5     Q97.5
R2 0.1936311 0.07823926 0.02939163 0.3204931
mckeon.brm3 %>%
  bayes_R2(re.form = NA, summary = FALSE) %>%
  median_hdci()
          y       ymin      ymax .width .point .interval
1 0.1993422 0.04768985 0.3319253   0.95 median      hdci
mckeon.brm3 %>%
  bayes_R2(re.form = ~ (1 | BLOCK), summary = FALSE) %>%
  median_hdci()
          y      ymin     ymax .width .point .interval
1 0.6524866 0.5188042 0.762788   0.95 median      hdci
## for random intercept/slope model
mckeon.brm4 %>%
  bayes_R2(re.form = ~ (SYMBIONT | BLOCK), summary = FALSE) %>%
  median_hdci()
          y      ymin      ymax .width .point .interval
1 0.7303833 0.5966863 0.8401578   0.95 median      hdci

11 References

Mckeon, Seabird, Adrian Stier, Shelby Mcilroy, and Benjamin Bolker. 2012. “Multiple Defender Effects: Synergistic Coral Defense by Mutualist Crustaceans.” Oecologia 169 (February): 1095–1103. https://doi.org/10.1007/s00442-012-2275-2.